library(here)
library(tidyverse)
library(gauntlet)
library(leaflet)
library(leaflet.extras2)
library(mapview)
library(sf)
library(SpatialKDE)
library(sfhotspot)
location = "data/req_dev"
folder = 'data_20230117_092037'
crs = 32610
bb_sa_layer = read_sf(here(location, 'study_area_network.shp'))
replica_trip_origin_links_file = here(location, folder, "replica_trip_origin_links.gpkg")
replica_trip_origin_links = read_sf(replica_trip_origin_links_file)
replica_trip_origin_links_fltrd = replica_trip_origin_links %>%
  st_filter(bb_sa_layer) %>%
  st_transform(crs = crs)
This document is a case study of how data acquired from the Replica data platform was used to perform a freight analysis in the Seattle area.
This vignette shows how to run a Kernel Density Estimate (KDE) and Hotspot analysis using the data retrieved using replicaToolkitR package.
The goal of this document is not to present a complete analysis but to showcase the ease with which such an analysis can be run.
NOTE: This vignette assumes you have already acquired replica data via replicaToolkitR using the replicaToolkitR::query_network_trip_using_bbox() function.
The data used in this analysis was acquired using the study area bounding box seen below.
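For reference, the acquisition step might look something like the sketch below. The argument names shown are illustrative assumptions, not the package's documented signature, and the bounding-box coordinates are placeholders; consult `?replicaToolkitR::query_network_trip_using_bbox` before running.

```r
# Sketch only - argument names are assumptions; see the replicaToolkitR
# documentation for the actual signature of this function.
library(replicaToolkitR)

query_network_trip_using_bbox(
  bbox = c(xmin = -122.46, ymin = 47.48, xmax = -122.22, ymax = 47.73) # placeholder bbox
  ,location = "data/req_dev" # write outputs where this vignette reads them from
)
```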
Note: Any trip travelling through this zone is
queried via replicaToolkitR. This is done to account for all vehicles
traveling on study area network links -
i.e. _internal-to-internal_,
_internal-to-external_,
_external-to-internal_, and
_external-to-external_ trips.
However, replicaToolkitR filters for only those trips that originate within the study area polygon when displaying or aggregating anything regarding trip origins.
The below map details trip origins as counts aggregated by each trip's
first network link. Links in this case are displayed as points
determined by their starting coordinates. This data is loaded from the
**replica_trip_origin_links.gpkg** file.
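A minimal way to inspect these origin points is sketched below, assuming the `count` column that is later used as the KDE weight holds the per-link trip total:

```r
# Quick interactive look at the origin-link points; `count` is the per-link
# trip total that is also used as the KDE weight later in this vignette.
library(sf)
library(mapview)

replica_trip_origin_links_fltrd %>%
  st_transform(crs = 4326) %>%
  mapview(layer.name = "Trip origins", zcol = "count", cex = "count")
```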
The SpatialKDE and sfhotspot packages are
used to perform the KDE analysis. The latter is built upon the former
and provides some convenience wrappers if you don’t want to get into
the nitty-gritty of some of the function options. Both are used in
this vignette.
I first create a grid that covers the spread of my data. I choose a cell size of 200 meters, as it is roughly the size of a city block in Seattle and a good unit for this analysis.
grid_sm = create_grid_hexagonal(
replica_trip_origin_links_fltrd
,cell_size = 200)
The water around the city is then removed using the
tigris package. This is an important step, as it makes the
distinction between a location that does not generate any trips and one
incapable of generating trips - i.e. freight trips cannot start in the
middle of Lake Washington.
grid_sm_water_removed = tigris::erase_water(grid_sm) %>%
st_cast("POLYGON")
Both grids are displayed in the below map. The grid with removed water features is seen in the right panel:
As you can see, the result is not perfect; Green Lake, Union Bay, and other water features have not been removed. In addition, this idea can be extended to other locations incapable of generating trips, such as parks, universities, airport tarmac, etc.
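One possible sketch of that extension removes Census area landmarks tagged as parks from the grid. The `MTFCC == "K2180"` filter (the code for parks) is an assumption worth verifying against the Census MTFCC definitions before relying on it.

```r
# Sketch: extend the water-removal idea to parks using Census area landmarks.
# MTFCC "K2180" (Park) is assumed here - verify against the Census MTFCC list.
library(tigris)
library(sf)
library(dplyr)

parks = landmarks(state = "WA", type = "area") %>%
  filter(MTFCC == "K2180") %>%
  st_transform(st_crs(grid_sm_water_removed))

# Erase the park footprints from the grid, as erase_water() did for water
grid_no_parks = grid_sm_water_removed %>%
  st_difference(st_union(parks))
```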
The R package mapedit provides functions that allow the
user to easily draw their own polygons, which can then be used to remove
non-trip-generating areas in the same way as the water removal above.
This vignette does not cover this.
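For the curious, that workflow might look like the following sketch. `drawFeatures()` opens an interactive map widget, so this only works in an interactive session:

```r
# Sketch: interactively draw polygons around other non-trip-generating areas,
# then erase them from the grid the same way the water was removed.
library(mapedit)
library(sf)

drawn_areas = drawFeatures()  # returns an sf object of the drawn polygons

grid_custom = grid_sm_water_removed %>%
  st_difference(st_union(st_transform(drawn_areas, st_crs(grid_sm_water_removed))))
```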
The below code runs the **hotspot_kde()** function for
different vehicle types in the data and using the different grids
previously created. The **purrr::pmap()** function allows
us to iterate over multiple arguments simultaneously resulting in a list
object (and saves us a bunch of keystrokes!).
hotspot_object = list(
list("MEDIUM_COMMERCIAL", "MEDIUM_COMMERCIAL", "HEAVY_COMMERCIAL", "HEAVY_COMMERCIAL")
,list(grid_sm_water_removed, grid_sm, grid_sm_water_removed, grid_sm)
) %>%
pmap(~{
temp_hotspot = replica_trip_origin_links_fltrd %>%
filter(vehicle_type == .x) %>%
hotspot_kde(
data = .
,grid = .y
,weights = count
,bandwidth = 800) %>%
mutate(kde_norm = gauntlet::normalize_min_max(kde)) %>%
mutate(label = str_glue("KDE (Min-Max Norm): {dgt2(kde_norm)}<br>KDE (raw): {dgt2(kde)}<br>TTL Trips: {sum}")) %>%
st_transform(crs = 4326)
})
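The `gauntlet::normalize_min_max()` call above rescales the raw KDE values onto [0, 1] so the two vehicle types can be compared on a common scale. If you don't have the gauntlet package installed, a minimal stand-in (assuming that function is a standard min-max rescale) is:

```r
# Stand-in for gauntlet::normalize_min_max(), assumed to be a standard
# min-max rescale of a numeric vector onto [0, 1].
normalize_min_max = function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

normalize_min_max(c(2, 4, 6))
# [1] 0.0 0.5 1.0
```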
The resulting 2D kernel density estimates for trip origins of heavy (left panel) and medium (right panel) duty vehicles are displayed below:
The results are pretty interesting - there is a rather dramatic difference between where heavy and medium duty trucks go in Seattle. Heavy duty trucks stick to the ports, whereas medium duty trucks are more distributed throughout Seattle (they’re smaller and can get around more easily, which makes sense!).
This analysis can be extended by using the hotspot_gistar() function, which allows us to easily calculate the gi-star Z-score statistic used to identify clusters of point locations. The theory behind this statistic will not be covered here but should be reviewed before implementing it.
temp_hotspot = replica_trip_origin_links_fltrd %>%
filter(vehicle_type == "MEDIUM_COMMERCIAL") %>%
hotspot_gistar(
data = .
,grid = grid_sm_water_removed
,weights = count
,bandwidth = 800)
The map below depicts the result of calculating the gi-star Z-score statistic. Statistically significant hot and cold locations are displayed:
temp_hotspot %>%
filter(pvalue <= .01) %>%
mutate(flag_gistar = case_when(gistar > 0 ~ "Hot", TRUE ~ "Cold")) %>%
mapview(layer.name = "Hotspots", zcol = "flag_gistar")
By just using the default input options, we get some pretty interesting findings. We see rather large cold spots in Green Lake, Union Bay, and Discovery Park (west of Magnolia) - this makes sense, as the first two locations are bodies of water and the third is a large wooded park where freight trucks don’t go. In comparison to their neighbors, these locations are especially cold regions for trucks, and the analysis identifies this. One could take these findings and remove these areas in subsequent analysis iterations.
This was a very simple vignette intended to highlight a specific workflow stemming from data acquired from the replicaToolkitR package.
For a more detailed analysis, one should consider: